Abstract

We first show how to construct an $O(n)$-depth $O(n)$-size quantum circuit for addition of two
$n$-bit binary numbers with no ancillary qubits. The exact size is $7n - 6$, which is smaller than
that of any other quantum circuit ever constructed for addition with no ancillary qubits. Using
the circuit, we then propose a method for constructing an $O(d(n))$-depth $O(n)$-size quantum
circuit for addition with $O(n/d(n))$ ancillary qubits for any $d(n) = \Omega(\log n)$. If we are allowed
to use unbounded fan-out gates with length $O(n^{\varepsilon})$ for an arbitrary small positive constant $\varepsilon$,
we can modify the method and construct an $O(e(n))$-depth $O(n)$-size circuit with $o(n)$ ancillary
qubits for any $e(n) = \Omega(\log^* n)$. In particular, these methods yield efficient circuits with depth
$O(\log n)$ and with depth $O(\log^* n)$, respectively. We apply our circuits to constructing efficient
quantum circuits for Shor's discrete logarithm algorithm.

1 Introduction

Since Shor's discovery of quantum algorithms for factoring and discrete logarithm problems [1],
many studies have investigated ways of constructing quantum circuits for the algorithms
[2, 3, 4, 5, 6, 7]. The resulting circuits are important not only for implementing the algorithms on a
quantum computer but also for understanding the computational power of small quantum circuits.
These studies have shown that addition of two binary numbers is a key operation for constructing
quantum circuits for Shor's algorithms.

We consider the problem of constructing quantum circuits for addition of two binary numbers
with better complexity. The complexity measures of a quantum circuit are its size, its depth, and
the number of qubits in it. Roughly speaking, the size and depth correspond to computation time,
while the number of qubits corresponds to the size of memory. We regard the number of qubits as
a primary consideration since it seems difficult to realize a quantum computer with many qubits.
It is not obvious whether the number of qubits in a quantum circuit for addition can be decreased
by using efficient classical circuits, though the size or depth can be decreased simply by using them.

An unbounded fan-out gate on $n + 1$ qubits copies a classical source bit into $n$ copies. In
particular, the gate on two qubits is a CNOT gate. If unbounded fan-out gates are available,
sublogarithmic-depth quantum circuits for various operations can be constructed [8, 9]. This is
because the gate performs the copy operation on an unbounded number of qubits in constant
time. However, it seems difficult to realize such a gate in practice. Thus, it is important to minimize
the number of target qubits of the gate in a circuit without increasing the complexity of the circuit.
When we use unbounded fan-out gates, we consider the complexity measures (size, depth, and the
number of qubits) together with the number of target qubits of the gate. We call the number of
target qubits the length of an unbounded fan-out gate.

There have been many studies of efficient quantum circuits for addition of two $n$-bit binary
numbers. These circuits can be classified according to depth complexity. Draper's and Takahashi
et al.'s circuits have depth $O(n)$ and use no ancillary qubits [10, 11]. Takahashi et al.'s is more
efficient than Draper's since the sizes of Takahashi et al.'s and Draper's are $O(n)$ and $O(n^2)$,
respectively. Draper et al.'s and Takahashi et al.'s circuits have depth $O(\log n)$ [12, 13]. Draper
et al.'s uses $O(n)$ ancillary qubits and its size is $O(n)$. Takahashi et al. decreased the number
of ancillary qubits to $O(n/\log n)$ without increasing the size asymptotically. Høyer et al. showed
that, if unbounded fan-out gates with length $O(n)$ are available, an $O(\log^* n)$-depth circuit can be
constructed [9]. They have not analyzed the number of ancillary qubits or the size.

In this paper, we first show how to construct an $O(n)$-depth $O(n)$-size quantum circuit for
addition with no ancillary qubits. The circuit is based on the ripple-carry approach. The exact
size is $7n - 6$, which is smaller than that of any other quantum circuit ever constructed for addition
with no ancillary qubits. Moreover, the circuit is more implementable than the previous circuits
with no ancillary qubits in the sense that the circuit can be used directly on a linear nearest
neighbor architecture [6], i.e., on a unidimensional array of qubits with nearest neighbor interactions
only. By combining the circuit with the carry-lookahead approach, we then propose a method for
constructing an $O(d(n))$-depth $O(n)$-size quantum circuit for addition with $O(n/d(n))$ ancillary
qubits for any $d(n) = \Omega(\log n)$. The method is a generalized and simplified version of Takahashi
et al.'s method for constructing a logarithmic-depth circuit with a small number of qubits [13]. In
particular, for $d(n) = \log n$, our method yields an $O(\log n)$-depth $O(n)$-size circuit with $O(n/\log n)$
ancillary qubits. The number of ancillary qubits is exactly the same as that in Takahashi et al.'s
circuit and the size is less than half that of Takahashi et al.'s.

If we are allowed to use unbounded fan-out gates with length $O(n^{\varepsilon})$ for an arbitrary small
positive constant $\varepsilon$, we can modify our method and construct an $O(e(n))$-depth $O(n)$-size circuit
with $O(n \log^{**} n / e(n))$ ancillary qubits for any $e(n) = \Omega(\log^* n)$, where $\log^{**} n$ is a slowly-growing
function satisfying $\log^{**} n = o(\log^* n)$. The main point of this modification is to decrease the
depth of the carry-lookahead part of our method by using a quantum version of Chandra et al.'s
constant-depth classical circuit for addition with unbounded fan-in and fan-out gates [14]. To
construct the quantum version, we require a quantum gate corresponding to an unbounded fan-in
gate. We use Høyer et al.'s small-depth quantum circuit for a generalized Toffoli operation with
unbounded fan-out gates [9] as the gate. In particular, for $e(n) = \log^* n$, the modified method
yields an $O(\log^* n)$-depth $O(n)$-size circuit with $o(n)$ ancillary qubits. Though Høyer et al. have
constructed an $O(\log^* n)$-depth circuit for addition as mentioned above, our construction shows
that the number of ancillary qubits, the size, and the length of an unbounded fan-out gate can be
small simultaneously.

This construction also shows that unbounded fan-out gates with a small length are sufficient to
construct a sublogarithmic-depth circuit. For example, if we are allowed to use unbounded fan-out
gates with length $O(\log n)$, we can construct an $O(\log n/\log\log n)$-depth $O(n)$-size circuit with $o(n)$
ancillary qubits. Such a sublogarithmic-depth circuit cannot be constructed by using a quantum
circuit only with gates on a bounded number of qubits [15] or by using a classical circuit only with
bounded fan-in and unbounded fan-out gates [16].

Using our circuits for addition, we construct efficient quantum circuits for Shor's discrete
logarithm algorithm for elliptic curves over the prime field GF($p$). This is done by simply using our
addition circuits in Proos et al.'s circuit for Shor's discrete logarithm algorithm [5]. Since Proos et
al.'s circuit uses $n$ ancillary qubits during addition, the use of our circuit with no ancillary qubits
removes these $n$ ancillary qubits without increasing the original depth or size asymptotically, where
$n$ is the length of the binary representation of $p$. Moreover, we decrease the depth asymptotically
by adding $o(n)$ ancillary qubits. Proos et al.'s circuit with our addition circuits is more efficient
than with the previous ones described above.

In contrast to the previous methods for constructing efficient quantum circuits for addition
[10, 11, 12, 13, 9], our method is general in the sense that it can yield various types of efficient
quantum circuits for addition. The generality allows us to construct quantum circuits appropriate
for various situations we will have to consider in practice. For example, if we want to save
qubits, we can obtain a qubit-efficient circuit by setting $d(n) = n$ in our method. We
can decrease the depth by setting $d(n) = \log n$. Moreover, we can choose an "intermediate" circuit
by setting $d(n) = \sqrt{n}$.



2 Circuit with Depth $O(n)$

2.1 Ripple-Carry Approach

We use the standard notation for quantum states and the standard diagrams for quantum circuits
[17]. As mentioned earlier, the measures of the complexity of a quantum circuit are the number
of qubits and its size and depth. The meaning of the number of qubits is obvious. The size of a
circuit is defined as the total number of elementary gates in it. The elementary gates are one-qubit
unitary gates, CNOT gates, controlled-$R_t$ gates, and Toffoli gates, where
$R_t|x\rangle = e^{2\pi i x/2^t}|x\rangle$ for $t \ge 1$ and $x \in \{0, 1\}$. In Section 4, we use the gate for an
unbounded fan-out operation $F_t$ as an elementary gate, where $F_t$ (on $t + 1$ qubits) is defined as

\[ F_t \left( |y\rangle \bigotimes_{i=0}^{t-1} |x_i\rangle \right) = |y\rangle \bigotimes_{i=0}^{t-1} |x_i \oplus y\rangle \]

for $y, x_i \in \{0, 1\}$. The symbol $\oplus$ denotes addition modulo 2. The depth of a circuit is defined as
follows. Input qubits are considered to have depth 0. For each gate $G$, the depth of $G$ is equal
to 1 plus the maximal depth of a gate on which $G$ depends. The depth of a circuit is equal to
the maximal depth of a gate in it. Intuitively, the depth is the number of layers in the circuit,
where a layer consists of gates that can be performed simultaneously. A quantum circuit can use
ancillary qubits, which start and end in the state $|0\rangle$. We usually count the number of ancillary
qubits instead of the number of all qubits used in the circuit.
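
The depth definition above can be sketched in a few lines of Python. This is only an illustration: the representation of a circuit as an ordered list of qubit-index tuples and the name `circuit_depth` are assumptions made here, not notation from the paper.

```python
from typing import Dict, Iterable, Sequence


def circuit_depth(gates: Iterable[Sequence[int]]) -> int:
    """Depth of a circuit given as an ordered list of gates.

    Each gate is the tuple of qubit indices it acts on (hypothetical
    representation). Input qubits have depth 0; a gate has depth 1 plus
    the maximal depth of an earlier gate sharing a qubit with it.
    """
    last: Dict[int, int] = {}  # qubit -> depth of the last gate touching it
    depth = 0
    for qubits in gates:
        d = 1 + max((last.get(q, 0) for q in qubits), default=0)
        for q in qubits:
            last[q] = d
        depth = max(depth, d)
    return depth
```

Two gates on disjoint qubits land in the same layer, while a gate that shares a qubit with an earlier one is pushed to the next layer.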

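We consider the problem of constructing quantum circuits for the operation $\mathrm{ADD}_n$ defined as

\[ \mathrm{ADD}_n \left( \bigotimes_{i=0}^{n-1} |b_i\rangle|a_i\rangle \right) |z\rangle = \left( \bigotimes_{i=0}^{n-1} |s_i\rangle|a_i\rangle \right) |z \oplus s_n\rangle, \]

where $a_{n-1} \cdots a_0$ and $b_{n-1} \cdots b_0$ are the input binary numbers, $z \in \{0, 1\}$, and $s_n \cdots s_0$ is the sum
of the input binary numbers. Our linear-depth circuit and most of the previous ones with a small
number of qubits are based on the ripple-carry approach. To explain the approach, we define the
carry bit $c_i$ ($0 \le i \le n$) as follows:

\[ c_i = \begin{cases} 0 & i = 0, \\ \mathrm{MAJ}(a_{i-1}, b_{i-1}, c_{i-1}) & 1 \le i \le n, \end{cases} \]

where MAJ is the majority function for three bits defined as $\mathrm{MAJ}(a, b, c) = ab \oplus bc \oplus ca$. In the
ripple-carry approach, the first step is to compute the carry bit $c_1$ by using $a_0$, $b_0$, and $c_0$.
Then, $c_2$ is computed by using $a_1$, $b_1$, and $c_1$. This procedure is repeated until all carry bits are
computed. After that, $s_i$ ($0 \le i \le n$) is computed by the relationship

\[ s_i = \begin{cases} a_i \oplus b_i \oplus c_i & 0 \le i \le n - 1, \\ c_n & i = n. \end{cases} \]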
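
As a purely classical illustration of the two recurrences above, the following minimal Python sketch computes the carry bits and the sum bits. The function names and the least-significant-bit-first list layout are assumptions made for this example.

```python
from typing import List, Tuple


def maj(a: int, b: int, c: int) -> int:
    """Majority of three bits: ab XOR bc XOR ca."""
    return (a & b) ^ (b & c) ^ (c & a)


def ripple_carry_sum(a_bits: List[int], b_bits: List[int]) -> Tuple[List[int], List[int]]:
    """Return (s, c) with s = [s_0, ..., s_n] and c = [c_0, ..., c_n].

    Bit lists are least-significant-bit first (a_bits[i] is a_i).
    """
    n = len(a_bits)
    c = [0] * (n + 1)                      # c_0 = 0
    for i in range(1, n + 1):              # c_i = MAJ(a_{i-1}, b_{i-1}, c_{i-1})
        c[i] = maj(a_bits[i - 1], b_bits[i - 1], c[i - 1])
    s = [a_bits[i] ^ b_bits[i] ^ c[i] for i in range(n)]
    s.append(c[n])                         # s_n = c_n
    return s, c
```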

When the ripple-carry approach is used, the key issue for constructing a quantum circuit with
a small number of qubits is how to store carry bits. Cuccaro et al.'s circuits, which are based on
the approach, use one ancillary qubit to store $c_0 = 0$ [18]. The carry bit $c_i$ is stored in the qubit
initially storing $a_{i-1}$ for $1 \le i \le n$. To do this, they defined the gate for MAJ depicted in Fig. 1,
which is the main component of their circuits. The gate maps $|c_i\rangle|b_i\rangle|a_i\rangle$ to
$|c_i \oplus a_i\rangle|b_i \oplus a_i\rangle|c_{i+1}\rangle$.
Takahashi et al.'s circuit, which is also based on the ripple-carry approach, uses no ancillary qubits
[11]. All the carry bits are stored in the qubit initially storing $z$. The main component of their
circuit is also the MAJ gate. They use the property that the gate maps
$|z \oplus b_i\rangle|z \oplus a_i\rangle|z \oplus c_i\rangle$ to $|b_i \oplus c_i\rangle|a_i \oplus c_i\rangle|z \oplus c_{i+1}\rangle$.

Figure 1: The MAJ gate.
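
Both mappings of the MAJ gate can be checked on classical basis states. The sketch below assumes the decomposition stated in Section 2.2 (two CNOT gates followed by one Toffoli gate) and takes the third qubit as the control of both CNOT gates, which is the choice consistent with the two mappings quoted above; the helper names are hypothetical.

```python
def maj(a: int, b: int, c: int) -> int:
    """Majority of three bits: ab XOR bc XOR ca."""
    return (a & b) ^ (b & c) ^ (c & a)


def maj_gate(x: int, y: int, w: int):
    """Action of the MAJ gate on a basis state |x>|y>|w>.

    Assumed decomposition: CNOT(w -> x), CNOT(w -> y), Toffoli(x, y -> w).
    """
    x ^= w
    y ^= w
    w ^= x & y
    return x, y, w
```

With inputs $(c_i, b_i, a_i)$ this returns $(c_i \oplus a_i, b_i \oplus a_i, c_{i+1})$, and with inputs $(z \oplus b_i, z \oplus a_i, z \oplus c_i)$ it returns $(b_i \oplus c_i, a_i \oplus c_i, z \oplus c_{i+1})$, because MAJ commutes with complementing all three inputs.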



2.2 Our Circuit 

We store the carry bit $c_i$ in the qubit initially storing $a_i$ for $1 \le i \le n - 1$ and store the
high-order bit $c_n$ in the qubit initially storing $z$. This would be difficult to do if we used the MAJ gate
directly. Our idea is to divide the MAJ gate into two parts. The first part consists of two CNOT
gates and the second one consists of one Toffoli gate. It is easy to verify that a Toffoli gate maps
$|b_i \oplus a_i\rangle|a_i \oplus c_i\rangle|a_{i+1} \oplus a_i\rangle$ to $|b_i \oplus a_i\rangle|a_i \oplus c_i\rangle|a_{i+1} \oplus c_{i+1}\rangle$
for $1 \le i \le n - 1$, where we consider $a_n$ as $z$. Thus, using CNOT gates (the first parts of the MAJ
gate) and a Toffoli gate, we first prepare the state

\[ |b_1 \oplus a_1\rangle|a_1 \oplus c_1\rangle \left( \bigotimes_{i=2}^{n-1} |b_i \oplus a_i\rangle|a_i \oplus a_{i-1}\rangle \right) |z \oplus a_{n-1}\rangle. \]

By applying Toffoli gates (the second parts of the MAJ gate), we can compute $c_i$ and store it in
the qubit initially storing $a_i$. The final Toffoli gate computes $c_n$ and stores it in the qubit initially
storing $z$. The detailed construction is described below.

Let $A_i$ and $B_i$ denote the memory locations initially storing $a_i$ and $b_i$, respectively, for
$0 \le i \le n - 1$. Let $A_n$ be the memory location initially storing $z$. Location $A_i$ ($0 \le i \le n - 1$) will store
$a_i$, $B_i$ ($0 \le i \le n - 1$) will store $s_i$, and $A_n$ will store $z \oplus s_n$ at the end of the computation. Our
circuit is constructed in the following six steps.

1. For $i = 1, \ldots, n - 1$:

Apply a CNOT gate to a pair of memory locations $B_i$ and $A_i$, where $A_i$ is used for the control
qubit.

2. For $i = n - 1, \ldots, 1$:

Apply a CNOT gate to a pair of memory locations $A_i$ and $A_{i+1}$, where $A_i$ is used for the
control qubit.

3. For $i = 0, \ldots, n - 1$:

Apply a Toffoli gate to a tuple of memory locations $B_i$, $A_i$, and $A_{i+1}$, where $B_i$ and $A_i$ are
used for the control qubits.

4. For $i = n - 1, \ldots, 1$:

Apply a CNOT gate to a pair of memory locations $B_i$ and $A_i$, where $A_i$ is used for the control
qubit. Then, apply a Toffoli gate to a tuple of memory locations $B_{i-1}$, $A_{i-1}$, and $A_i$, where
$B_{i-1}$ and $A_{i-1}$ are used for the control qubits.

5. For $i = 1, \ldots, n - 2$:

Apply a CNOT gate to a pair of memory locations $A_i$ and $A_{i+1}$, where $A_i$ is used for the
control qubit.

6. For $i = 0, \ldots, n - 1$:

Apply a CNOT gate to a pair of memory locations $B_i$ and $A_i$, where $A_i$ is used for the control
qubit.

The circuit for $\mathrm{ADD}_5$ is depicted in Fig. 2.

Figure 2: The circuit for $\mathrm{ADD}_5$.
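
Because every gate in the six steps is a CNOT or Toffoli gate, the circuit permutes computational basis states and can be checked by a classical simulation. The following Python sketch does this under assumptions of this example only: the tuple encoding of gates, the labels `("A", i)` and `("B", i)`, and the function names are all hypothetical.

```python
def add_gates(n):
    """Gate list for the six steps; ("A", n) is the location holding z.

    Each gate is (kind, control(s)..., target).
    """
    A = lambda i: ("A", i)
    B = lambda i: ("B", i)
    gates = []
    gates += [("CNOT", A(i), B(i)) for i in range(1, n)]              # Step 1
    gates += [("CNOT", A(i), A(i + 1)) for i in range(n - 1, 0, -1)]  # Step 2
    gates += [("TOF", B(i), A(i), A(i + 1)) for i in range(n)]        # Step 3
    for i in range(n - 1, 0, -1):                                     # Step 4
        gates.append(("CNOT", A(i), B(i)))
        gates.append(("TOF", B(i - 1), A(i - 1), A(i)))
    gates += [("CNOT", A(i), A(i + 1)) for i in range(1, n - 1)]      # Step 5
    gates += [("CNOT", A(i), B(i)) for i in range(n)]                 # Step 6
    return gates


def run_adder(n, a, b, z):
    """Apply the gate list to the basis state encoding integers a, b and bit z."""
    state = {("A", i): (a >> i) & 1 for i in range(n)}
    state.update({("B", i): (b >> i) & 1 for i in range(n)})
    state[("A", n)] = z
    for gate in add_gates(n):
        *controls, target = gate[1:]
        if all(state[c] for c in controls):
            state[target] ^= 1          # flip the target when all controls are 1
    return state
```

After `run_adder`, location `("B", i)` holds $s_i$, `("A", i)` holds $a_i$ again, and `("A", n)` holds $z \oplus s_n$.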

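We describe the changes of the input state of $\mathrm{ADD}_n$ to show that the circuit works correctly.
In Step 1, the input state is transformed into

\[ |b_0\rangle|a_0\rangle \left( \bigotimes_{i=1}^{n-1} |b_i \oplus a_i\rangle|a_i\rangle \right) |z\rangle. \]

In Step 2, the state is transformed into

\[ |b_0\rangle|a_0\rangle|b_1 \oplus a_1\rangle|a_1\rangle \left( \bigotimes_{i=2}^{n-1} |b_i \oplus a_i\rangle|a_i \oplus a_{i-1}\rangle \right) |z \oplus a_{n-1}\rangle. \]

The first Toffoli gate in Step 3 transforms the state into

\[ |b_0\rangle|a_0\rangle|b_1 \oplus a_1\rangle|a_1 \oplus c_1\rangle \left( \bigotimes_{i=2}^{n-1} |b_i \oplus a_i\rangle|a_i \oplus a_{i-1}\rangle \right) |z \oplus a_{n-1}\rangle. \]

This is repeated by using a Toffoli gate. The state after Step 3 is

\[ |b_0\rangle|a_0\rangle \left( \bigotimes_{i=1}^{n-1} |b_i \oplus a_i\rangle|a_i \oplus c_i\rangle \right) |z \oplus s_n\rangle. \]

In Step 4, the state is transformed into

\[ |b_0\rangle|a_0\rangle|b_1 \oplus c_1\rangle|a_1\rangle \left( \bigotimes_{i=2}^{n-1} |b_i \oplus c_i\rangle|a_i \oplus a_{i-1}\rangle \right) |z \oplus s_n\rangle. \]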



Table 1: Comparison of Our Circuit and Previous Circuits

Circuit                 Ancilla   Size                  Toffoli   Depth     LNN
Cuccaro et al. [18]     1         6n + 1                2n        6n + 1
Cuccaro et al. [18]     1         9n - 8                2n - 1    2n + 4
Draper [10]             0         1.5n^2 + 4.5n + 2     0         5n + 3
Takahashi et al. [11]   0         10n - 9               4n - 5    8n - 7
Our Circuit             0         7n - 6                2n - 1    5n - 3    √

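In Step 5, the state is transformed into

\[ |b_0\rangle|a_0\rangle \left( \bigotimes_{i=1}^{n-1} |b_i \oplus c_i\rangle|a_i\rangle \right) |z \oplus s_n\rangle. \]

Since $s_i = a_i \oplus b_i \oplus c_i$ for $0 \le i \le n - 1$, the final step gives us the desired output state.

2.3 Complexity Analysis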

From the construction, it is obvious that our circuit uses no ancillary qubits. We compute the
depth and size of the circuit for $n \ge 3$ precisely. In Step 1, the number of CNOT gates is $n - 1$ and
these gates can be performed simultaneously. Thus, the depth and size of Step 1 are 1 and $n - 1$,
respectively. In Step 2, the number of CNOT gates is $n - 1$ and thus the depth and size of Step 2
are $n - 1$. In Step 3, the number of Toffoli gates is $n$ and thus the depth and size of Step 3 are $n$.
In Step 4, the number of CNOT gates is $n - 1$ and the number of Toffoli gates is $n - 1$. Thus, the
depth and size of Step 4 are $2n - 2$. In Step 5, the number of CNOT gates is $n - 2$ and thus the
depth and size of Step 5 are $n - 2$. In Step 6, the number of CNOT gates is $n$ and these gates can
be performed simultaneously. Thus, the depth and size of Step 6 are 1 and $n$, respectively. Thus,
the depth and size of the whole circuit are $5n - 3$ and $7n - 6$, respectively. The numbers of CNOT
and Toffoli gates are $5n - 5$ and $2n - 1$, respectively.
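
These counts can be reproduced mechanically. The sketch below rebuilds the six steps as tuples of qubit indices and evaluates the depth with the definition from Section 2.1. The interleaved indexing ($B_i \mapsto 2i$, $A_i \mapsto 2i + 1$) and the function names are assumptions of this example.

```python
def adder_gate_qubits(n):
    """Qubit tuples of the six steps, in order (controls first, target last)."""
    A = lambda i: 2 * i + 1   # A_n (index 2n + 1) holds z
    B = lambda i: 2 * i
    g = [(A(i), B(i)) for i in range(1, n)]                   # Step 1
    g += [(A(i), A(i + 1)) for i in range(n - 1, 0, -1)]      # Step 2
    g += [(B(i), A(i), A(i + 1)) for i in range(n)]           # Step 3
    for i in range(n - 1, 0, -1):                             # Step 4
        g += [(A(i), B(i)), (B(i - 1), A(i - 1), A(i))]
    g += [(A(i), A(i + 1)) for i in range(1, n - 1)]          # Step 5
    g += [(A(i), B(i)) for i in range(n)]                     # Step 6
    return g


def depth(gates):
    """A gate's depth is 1 plus the largest depth of an earlier gate sharing a qubit."""
    last, best = {}, 0
    for qubits in gates:
        d = 1 + max(last.get(q, 0) for q in qubits)
        for q in qubits:
            last[q] = d
        best = max(best, d)
    return best


def summary(n):
    """Return (size, number of Toffoli gates, depth) for the n-bit circuit."""
    gates = adder_gate_qubits(n)
    toffoli = sum(1 for q in gates if len(q) == 3)
    return len(gates), toffoli, depth(gates)
```

For $n \ge 3$, `summary(n)` agrees with the closed forms $7n - 6$, $2n - 1$, and $5n - 3$.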

As discussed in [6], many proposed quantum computer architectures deal with a unidimensional
array of qubits with nearest neighbor interactions only. Thus, it is important for a circuit to work
on such a linear nearest neighbor (LNN) architecture. When the input and output binary numbers
are arranged on an LNN architecture in an interleaved manner (as in Fig. 2), our circuit can be
used directly on an LNN architecture in the sense that the circuit can be transformed into one on
an LNN architecture without increasing the size or depth asymptotically.

A comparison of our circuit and the previous ones with a small number of qubits is summarized
in Table 1. The symbol "√" in the LNN column means that the circuit can be used directly on
an LNN architecture in the sense described above. The symbol "-" means that we do not know
whether this is the case for the circuit. The size of our circuit is less than that of any other quantum
circuit ever constructed for $\mathrm{ADD}_n$ with no ancillary qubits. When we regard the number of qubits
as a primary consideration, our circuit is more efficient than the previous circuits in Table 1.

Though there exists a size-efficient or depth-efficient circuit with one ancillary qubit [18], it
is worth noting that the difference between the total number of ancillary qubits used by parallel
applications of our circuit (as in the next section) and that of the previous circuit with one ancillary
qubit depends on the number of circuits applied in parallel and may become large. Moreover, since
Toffoli gates are on three qubits and thus may be harder to implement than the other gates (on
a smaller number of qubits), it is worth noting that the number of Toffoli gates in our circuit is
$2n - 1$, which is less than or equal to those of the previous circuits in Table 1 (excluding Draper's
$O(n^2)$-size circuit).

3 General Method



3.1 Combination Method 

The ripple-carry approach decreases the number of ancillary qubits but requires large depth. The
carry-lookahead approach decreases the depth but requires many qubits [12]. Our method is based
on the combination of these methods and is a generalized and simplified version of Takahashi et
al.'s method for constructing a logarithmic-depth circuit with a small number of qubits [13]. In
this section, we review the previous method. The carry-lookahead approach is described by using
two bits $p[i, j]$ ($1 \le i < j \le n$) and $g[i, j]$ ($0 \le i < j \le n$) [12]. The bit $p[i, j]$ is 1 if a carry bit is
propagated from bit position $i$ to bit position $j$, and $g[i, j]$ is 1 if a carry bit is generated between
bit positions $i$ and $j$. The $p[i, j]$ and $g[i, j]$ are computed by the following relations:

• For any $i$ such that $1 \le i \le n - 1$, $p[i, i + 1] = a_i \oplus b_i$.

• For any $i, j$ such that $1 \le i < i + 1 < j \le n$, $p[i, j] = p[i, t]\,p[t, j]$ for any $t$ satisfying $i < t < j$.

• For any $i$ such that $0 \le i \le n - 1$, $g[i, i + 1] = a_i b_i$.

• For any $i, j$ such that $0 \le i < i + 1 < j \le n$, $g[i, j] = g[i, t]\,p[t, j] \oplus g[t, j]$ for any $t$ satisfying
$i < t < j$.

It holds that $g[0, j] = c_j$ for all $1 \le j \le n$.
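
The relations above, and the identity $g[0, j] = c_j$, can be checked classically. The following Python sketch evaluates the recurrences with the midpoint as the split point $t$; that choice, the function names, and the least-significant-bit-first lists are assumptions made for illustration (any $t$ with $i < t < j$ gives the same value).

```python
def propagate(a, b, i, j):
    """p[i, j] for bit lists a, b (least-significant bit first)."""
    if j == i + 1:
        return a[i] ^ b[i]
    t = (i + j) // 2                      # any split point i < t < j works
    return propagate(a, b, i, t) & propagate(a, b, t, j)


def generate(a, b, i, j):
    """g[i, j] = g[i, t] p[t, j] XOR g[t, j]."""
    if j == i + 1:
        return a[i] & b[i]
    t = (i + j) // 2
    return (generate(a, b, i, t) & propagate(a, b, t, j)) ^ generate(a, b, t, j)


def carries(a, b):
    """Carry bits c_0, ..., c_n from the ripple-carry recurrence."""
    c = [0]
    for x, y in zip(a, b):
        c.append((x & y) ^ (y & c[-1]) ^ (c[-1] & x))
    return c
```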

Draper et al.'s quantum carry-lookahead adder first computes $p[i, i + 1]$ ($1 \le i \le n - 1$) and
$g[i, i + 1]$ ($0 \le i \le n - 1$). Then, it computes $g[0, i]$ ($1 \le i \le n$) by successively doubling the
sizes of the intervals under consideration. Lastly, it computes $s_i$ ($0 \le i \le n$), where $s_0 = p[0, 1]$,
$s_i = p[i, i + 1] \oplus g[0, i]$ ($1 \le i \le n - 1$), and $s_n = g[0, n]$. The key circuit is the one for the second
step. We call this circuit the $\mathrm{CARRY}_1$ gate. In general, the $\mathrm{CARRY}_l$ gate is a circuit for the
operation

\[ \bigotimes_{i=1}^{\lfloor n/2^{l-1} \rfloor - 1} |p_{l-1}[i]\rangle \bigotimes_{j=0}^{\lfloor n/2^{l-1} \rfloor - 1} |g_{l-1}[j]\rangle \;\to\; \bigotimes_{i=1}^{\lfloor n/2^{l-1} \rfloor - 1} |p_{l-1}[i]\rangle \bigotimes_{j=0}^{\lfloor n/2^{l-1} \rfloor - 1} |g[0, 2^{l-1}(j + 1)]\rangle, \]

where $1 \le l \le \lfloor \log n \rfloor - 1$, $p_{l-1}[i] = p[2^{l-1} i, 2^{l-1}(i + 1)]$, and $g_{l-1}[i] = g[2^{l-1} i, 2^{l-1}(i + 1)]$ [13]. The
$\mathrm{CARRY}_l$ gate uses $\sum_{t=l}^{\lfloor \log n \rfloor - 1} (\lfloor n/2^t \rfloor - 1)$ ancillary qubits and its depth and size are $O(\log n - l)$
and $O(\sum_{t=l}^{\lfloor \log n \rfloor - 1} (\lfloor n/2^t \rfloor - 1))$, respectively. Draper et al.'s quantum carry-lookahead adder uses
$O(n)$ ancillary qubits and its depth and size are $O(\log n)$ and $O(n)$, respectively.

In Takahashi et al.'s combination method, the input binary number $a_{n-1} \cdots a_0$ is divided into
$n/k$ blocks of length $k$, where we assume that $n$ is a power of two for simplicity and set
$k = 2^{\lfloor \log\log n \rfloor}$ and $l = \lfloor \log\log n \rfloor + 1$. Note that $k = \Theta(\log n)$ and $n$ is divisible by $k$. That is, we consider a
$k$-bit binary number $a(j) = a_{(j+1)k-1} \cdots a_{jk}$ for $0 \le j \le n/k - 1$. Similarly, we consider $b(j)$ for
$b_{n-1} \cdots b_0$. Roughly speaking, the previous method is described as follows:

1. Compute the high-order bit of $a(j) + b(j)$, which is $g_{l-1}[j] = g[jk, (j + 1)k]$, using the
ripple-carry approach [11] for $0 \le j \le n/k - 1$.

2. Compute the value $\bigwedge_{i=0}^{k-1} (a_{jk+i} \oplus b_{jk+i})$, which is $p_{l-1}[j] = p[jk, (j + 1)k]$, using Barenco et
al.'s circuit for a generalized Toffoli operation [19] for $0 \le j \le n/k - 1$, where the generalized
Toffoli operation $T_t$ (on $t + 1$ qubits) is defined as

\[ T_t \left( |y\rangle \bigotimes_{i=0}^{t-1} |x_i\rangle \right) = \Bigl| y \oplus \bigwedge_{i=0}^{t-1} x_i \Bigr\rangle \bigotimes_{i=0}^{t-1} |x_i\rangle. \]

3. Compute the carry bit $c_{jk} = g[0, jk]$ using the values computed in Steps 1 and 2 for
$1 \le j \le n/k$. This is done by using the $\mathrm{CARRY}_l$ gate.

4. Compute the carry bit $g[0, i]$ using the carry bits computed in Step 3 for $1 \le i \le n$ and obtain
$s_i$ for $0 \le i \le n$. This is done by a circuit based on the ripple-carry approach as in Step 1.

The whole circuit uses $O(n/k)$ ($= O(n/\log n)$) ancillary qubits and its depth and size are $O(k)$
($= O(\log n)$) and $O(n)$, respectively.
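
The data flow of the four steps can be illustrated classically. The sketch below is not the quantum construction: in particular, Step 3 is done here by a simple sequential loop over blocks, whereas the method above uses the logarithmic-depth $\mathrm{CARRY}_l$ gate. The function name and the list layout are assumptions of this example.

```python
def blockwise_add(a, b, k):
    """Add bit lists a, b (least-significant bit first) using blocks of length k."""
    n = len(a)
    assert n % k == 0
    m = n // k
    G, P = [], []
    for j in range(m):
        c, p = 0, 1
        for i in range(j * k, (j + 1) * k):
            c = (a[i] & b[i]) ^ (b[i] & c) ^ (c & a[i])   # Step 1: ripple inside the block
            p &= a[i] ^ b[i]                              # Step 2: AND of a_i XOR b_i
        G.append(c)   # g[jk, (j+1)k]
        P.append(p)   # p[jk, (j+1)k]
    block_carry = [0]                                     # Step 3: c_{jk} = g[0, jk]
    for j in range(m):
        block_carry.append((block_carry[j] & P[j]) ^ G[j])
    s = []
    for j in range(m):                                    # Step 4: ripple from c_{jk}
        c = block_carry[j]
        for i in range(j * k, (j + 1) * k):
            s.append(a[i] ^ b[i] ^ c)
            c = (a[i] & b[i]) ^ (b[i] & c) ^ (c & a[i])
    s.append(block_carry[m])                              # s_n = c_n
    return s
```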

3.2 Our Method 

Our idea is to divide the input binary numbers into $n/d(n)$ blocks of length $d(n)$ in Takahashi
et al.'s method, where $d(n) = \Omega(\log n)$. By using the $\mathrm{CARRY}_{\log d(n) + 1}$ gate, we can construct an
$O(d(n))$-depth $O(n)$-size circuit with $O(n/d(n))$ ancillary qubits. This is a simple generalization
of the previous method. Though this allows us to construct an $O(d(n))$-depth circuit for any
$d(n) = \Omega(\log n)$ in contrast to the previous method, it, of course, does not improve the previous
$O(\log n)$-depth circuit.

To obtain an efficient circuit, we simplify Steps 1, 2, and 4 in the previous method using the
circuit for addition in Section 2. The simplification of Step 4 is due to a direct application of
the circuit for addition. To simplify Steps 1 and 2, we use only the first halves of our circuit for
addition and Barenco et al.'s circuit for $T_n$ [19]. The first half of the circuit for addition outputs the
high-order bit of $a(j) + b(j)$ and appropriate inputs to Barenco et al.'s circuit. Because we use only
the first half, we save Toffoli gates, but some qubits are left holding values that are not needed. An
important point is that Barenco et al.'s circuit can use these qubits as uninitialized ancillary qubits.
We likewise use only the first half of Barenco et al.'s circuit and thus again save Toffoli gates, but
some qubits are again left holding values that are not needed. This is not a problem since these
qubits are reset to the initial values in later steps. The details are described below.

To simplify Steps 1 and 2, since we need to compute only the two bits $g[i, j]$ and $p[i, j]$ for some
$i, j$, it suffices to construct an efficient quantum circuit for the operation

\[ \left( \bigotimes_{i=0}^{w-1} |b_i\rangle|a_i\rangle \right) |0\rangle|0\rangle \;\to\; \left( \bigotimes_{i=0}^{w-1} |p[i, i + 1]\rangle|r_i\rangle \right) |g[0, w]\rangle|p[0, w]\rangle, \]

where $a_{w-1} \cdots a_0$ and $b_{w-1} \cdots b_0$ are the input binary numbers, $r_0 = a_0$, and $r_i = a_i \oplus g[0, i] \oplus p[0, i]$
($1 \le i \le w - 1$). Let $A_i$ and $B_i$ denote the memory locations initially storing $a_i$ and $b_i$, respectively.
Let $G$ and $P$ be the memory locations initially storing 0. Location $A_i$ will store $r_i$, $B_i$ will store
$p[i, i + 1]$, $G$ will store $g[0, w]$, and $P$ will store $p[0, w]$ at the end of the computation. The circuit
is defined as follows:

1. Apply the first half of the circuit (for two $w$-bit binary numbers) in Section 2 to a tuple of
memory locations $A_i$ ($0 \le i \le w - 1$) and $B_i$ ($0 \le i \le w - 1$) and $G$.

2. Apply a CNOT gate to a pair of memory locations $A_0$ and $B_0$, where $A_0$ is used for the
control bit.

3. Apply the first half of Barenco et al.'s circuit for $T_w$ to a tuple of memory locations $A_i$
($0 \le i \le w - 1$) and $B_i$ ($0 \le i \le w - 1$) and $P$, where $A_i$ is used as an uninitialized ancillary
memory location.

Step 1 writes the value $g[0, w]$ into the memory location $G$. The memory location $A_i$ stores the
value $r_i$. Step 2 writes $p[0, 1]$ into the memory location $B_0$. Step 3 uses the memory location $A_i$ as
an uninitialized ancillary memory location and writes the value $p[0, w]$ into the memory location
$P$. The whole circuit uses no ancillary qubits and its depth and size are $O(w)$. We call the circuit
the $\mathrm{INIT}_w$ gate. The $\mathrm{INIT}_5$ gate is depicted in Fig. 3.

Figure 3: The $\mathrm{INIT}_5$ gate. A dashed-line box represents the part for computing $g[0, 5]$, which is
the first half of our circuit for addition in Section 2.

Figure 4: The $\mathrm{SUM}_5$ gate.

To simplify Step 4, it suffices to construct an efficient quantum circuit for the operation

\[ |c\rangle \left( \bigotimes_{j=0}^{w-1} |b_j\rangle|a_j\rangle \right) \;\to\; |c\rangle \left( \bigotimes_{j=0}^{w-1} |t_j\rangle|a_j\rangle \right), \]

where $c \in \{0, 1\}$, $a_{w-1} \cdots a_0$ and $b_{w-1} \cdots b_0$ are the input binary numbers, $t_j = a_j \oplus b_j \oplus d_j$
($0 \le j \le w - 1$), and $d_j$ is defined as

\[ d_j = \begin{cases} c & j = 0, \\ \mathrm{MAJ}(a_{j-1}, b_{j-1}, d_{j-1}) & 1 \le j \le w - 1. \end{cases} \]

We can directly apply the circuit in Section 2 to constructing such a circuit and thus omit the
details. The circuit uses no ancillary qubits and its depth and size are $O(w)$. We call the circuit
the $\mathrm{SUM}_w$ gate. The $\mathrm{SUM}_5$ gate is depicted in Fig. 4.

3.3 The Whole Circuit 

We construct a quantum circuit for $\mathrm{ADD}_n$. For simplicity, we assume that $n$ is a power of two.
Let $d(n) = \Omega(\log n)$. We set $k = 2^{\lfloor \log d(n) \rfloor}$ and $l = \lfloor \log d(n) \rfloor + 1$. Note that $k = \Theta(d(n))$ and
$n$ is divisible by $k$. As described in Section 3.1, we consider $k$-bit binary numbers $a(j)$ and $b(j)$.
Let $A_i$ and $B_i$ denote the memory locations initially storing $a_i$ and $b_i$, respectively. Let $Z$ be the
memory location initially storing $z \in \{0, 1\}$. Location $A_i$ will store $a_i$, $B_i$ will store $s_i$, and $Z$ will
store $z \oplus s_n$ at the end of the computation. We assume that there are ancillary memory locations
initially storing 0. The first half of our circuit is defined as follows:

1. Apply the $\mathrm{INIT}_k$ gate to memory locations storing $a(j)$ and $b(j)$ and to two ancillary memory
locations storing 0 for $0 \le j \le n/k - 1$. The gate writes $g_{l-1}[j]$ and $p_{l-1}[j]$ into the ancillary
memory locations.

2. Apply the $\mathrm{CARRY}_l$ gate to memory locations storing all $g_{l-1}[j]$ and all $p_{l-1}[j]$ and to ancillary
memory locations storing 0. The gate writes $c_{(j+1)k}$ into the memory location storing $g_{l-1}[j]$
for $0 \le j \le n/k - 1$.

3. Apply the gates in Step 1 in reverse order, where we exclude the gates applied to memory
locations storing $c_{(j+1)k}$ for $0 \le j \le n/k - 1$ since we do not erase the value.

4. Apply the $\mathrm{SUM}_k$ gate to memory locations storing $a(j + 1)$ and $b(j + 1)$ and to a memory
location storing $c_{k(j+1)}$ to obtain $s_{k(j+1)}, \ldots, s_{k(j+2)-1}$ for $0 \le j \le n/k - 2$. Apply a simplified
gate of the $\mathrm{SUM}_k$ gate to memory locations storing $a(0)$ and $b(0)$ to obtain $s_0, \ldots, s_{k-1}$.

The last half deletes unnecessary carry bits using the fact that the carry bits generated for
computing $a + s'$ are the same as those for computing $a + b$, where $s'$ is the bitwise complement of $s$.

5. Apply a NOT gate to $B_i$ to write $s_i \oplus 1$ into $B_i$ for $0 \le i \le n - k - 1$.

6. Apply the first half of our circuit excluding Step 4 in reverse order, where we exclude the
gates applied to memory locations storing $a(n/k - 1)$ and $b(n/k - 1)$ since we do not erase the
last carry bit. The gate writes 0 into a memory location storing $c_{k(j+1)}$ for $0 \le j < n/k - 1$.

7. Apply a NOT gate to $B_i$ to write $s_i$ into $B_i$ for $0 \le i \le n - k - 1$.

The whole circuit for $d(n) = \log n$ and $n = 8$ (and thus $k = l = 2$) is depicted in Fig. 5.

We compute the number of ancillary qubits, the depth, and the size precisely. For simplicity,
we count only Toffoli gates as in [12, 13]. Step 1 requires $2n/k$ ancillary qubits to use $n/k$ $\mathrm{INIT}_k$ gates.
The $\mathrm{INIT}_n$ gate consists of $3n - 2$ Toffoli gates for $n \ge 3$. Thus, the depth and size of Step 1 are
$3k - O(1)$ and $3n - O(n/k)$, respectively. The $\mathrm{CARRY}_l$ gate in Step 2 uses $n/k - O(\log n)$ ancillary
qubits, its depth is $2\log(n/k) + O(1)$, and its size is $O(n/k)$, where $n/k \ge 4$ [13]. Step 3 is
the same as Step 1. Step 4 uses $n/k$ $\mathrm{SUM}_k$ gates. The $\mathrm{SUM}_n$ gate consists of $2n - 2$ Toffoli gates for
$n \ge 3$. Thus, the depth and size of Step 4 are $2k - O(1)$ and $2n - O(n/k)$, respectively. The other
steps are the same as the above steps excluding Step 4. Our circuit uses $3n/k - O(\log n)$ ancillary
qubits and its depth and size are $14k + 4\log(n/k) + O(1)$ and $14n - O(n/k)$, respectively, where
$n/k \ge 4$. Thus, the circuit uses $O(n/d(n))$ ancillary qubits and its depth and size are $O(d(n))$ and
$O(n)$, respectively. For example, for $d(n) = \log n$ and $n \ge 16$, the number of ancillary qubits, the
depth, and the size are approximately $3n/\log n$, $18\log n$, and $14n$, respectively. The corresponding
previous bounds are $3n/\log n$, $30\log n$, and $29n$. That is, in this case, the number of ancillary qubits
in our circuit is the same as that in Takahashi et al.'s [13] and the leading coefficient of the expression
of the size in our circuit is less than half that in Takahashi et al.'s.
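
As a rough numerical aid, the following Python sketch computes the parameters $k$ and $l$ of Section 3.3 for a given $n$ and $d(n)$ and evaluates the leading terms of the bounds quoted above. It assumes logarithms to base 2, ignores the lower-order terms, and uses hypothetical function names.

```python
import math


def block_parameters(d):
    """k = 2^floor(log d) and l = floor(log d) + 1 for a block-length target d = d(n)."""
    e = math.floor(math.log2(d))
    return 2 ** e, e + 1


def leading_terms(n, k):
    """Leading terms only: ancillary qubits 3n/k, depth 14k + 4 log(n/k), size 14n."""
    return {
        "ancilla": 3 * n / k,
        "depth": 14 * k + 4 * math.log2(n / k),
        "size": 14 * n,
    }
```

For $n = 8$ and $d(n) = \log n = 3$ this gives $k = l = 2$, matching the parameters of the circuit in Fig. 5.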

Figure 5: The circuit for $\mathrm{ADD}_8$, where $d(n) = \log n$. The first and third dashed-line boxes represent
the carry-lookahead part [12, 13]. The second one represents the parallel applications of the $\mathrm{SUM}_2$
gate.



4 Circuit with Depth $o(\log n)$

4.1 Chandra et al.'s Classical Circuit

If we use only one-qubit and two-qubit gates as elementary gates, we cannot construct an
$o(\log n)$-depth circuit for $\mathrm{ADD}_n$. This follows from the logarithmic lower bound on circuit depth
shown in [15]. To construct an $o(\log n)$-depth circuit, we decrease the depth of the
carry-lookahead part of our method in Section 3 by using a quantum version of Chandra et al.'s
efficient classical circuit for addition with (classical) unbounded fan-out gates [14]. We assume that
we have unbounded fan-out gates (described in Section 2) as elementary gates. We first consider
the simple case where we have unbounded fan-out gates with a long length and then reduce the
length.

Chandra et al.'s method for constructing the circuit is a generalization of the carry-lookahead
approach. Besides the (classical) unbounded fan-out gates, the circuit uses unbounded fan-in gates
that compute logical AND (or OR) of an unbounded number of input bits. The depth and size of
the circuit for two $m$-bit binary numbers are $O(1)$ and $O(m \log^{**} m)$, respectively, where

\[ \log^{**} t = \min\{ j \mid \underbrace{\log^* \cdots \log^*}_{j} t \le 1 \}, \qquad \log^* t = \min\{ j \mid \underbrace{\log \cdots \log}_{j} t \le 1 \}. \]

It can be shown that $\log^{**} m = o(\log^* m)$. Though the definition of the depth of a classical circuit
is similar to that of a quantum circuit, the definition of the size of a classical circuit in [14] is
different from that of a quantum circuit. More precisely, a classical circuit is defined as a directed
acyclic graph, the size is the number of edges in the circuit, and the depth is the length of a
longest path from an input node to an output node. Chandra et al. give a tighter bound on the
size of the circuit, but we use the above bound since it is sufficient for showing that our circuits in
Sections 4.2 and 4.3 use a sublinear number of ancillary qubits.
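
To get a feeling for how slowly these functions grow, the two definitions can be evaluated directly. The sketch below assumes logarithms to base 2 and counts from $j = 0$ (so both functions return 0 for $t \le 1$); the function names are hypothetical.

```python
import math


def log_star(t):
    """Smallest j such that applying log (base 2) j times to t gives a value <= 1."""
    j = 0
    while t > 1:
        t = math.log2(t)
        j += 1
    return j


def log_star_star(t):
    """Smallest j such that applying log_star j times to t gives a value <= 1."""
    j = 0
    while t > 1:
        t = log_star(t)
        j += 1
    return j
```

For instance, iterating the logarithm on 65536 gives 16, 4, 2, 1, so four applications are needed, while iterating `log_star` on 65536 gives 4, 2, 1.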

4.2 Simple Case

4.2.1 Quantum Version of Chandra et al.'s Circuit 

We transform Chandra et al.'s classical circuit for two m-bit binary numbers into its quantum
version. Since the size (that is, the number of edges) of the circuit is O(m log** m), it suffices to
consider an unbounded fan-out gate with length O(m log** m) and a T_t gate (corresponding to an
unbounded fan-in gate with t inputs in the classical circuit) with t = O(m log** m). We assume that
we have unbounded fan-out gates with length O(m log** m). If we have one-qubit gates, CNOT
gates, T_t gates, and unbounded fan-out gates with length O(m log** m), Chandra et al.'s classical
circuit can be simply transformed into its quantum version. Note that an OR gate in Chandra
et al.'s circuit is transformed into a T_t gate with NOT gates. However, in our setting, we have
only one-qubit gates, CNOT gates, and unbounded fan-out gates with length O(m log** m). Thus,
we require a quantum circuit for T_t (consisting of one-qubit gates, CNOT gates, and unbounded
fan-out gates with length O(m log** m)). We use Høyer et al.'s circuit for the T_t operation (defined
in Section 3.1) as the T_t gate [9]. They showed that, if unbounded fan-out gates with length O(t)
are available, an O(log* t)-depth O(t)-size quantum circuit for T_t can be constructed. We can show
that Høyer et al.'s circuit uses O(t) ancillary qubits. Since we have unbounded fan-out gates with
length O(m log** m), we can directly use Høyer et al.'s circuit for T_t with t = O(m log** m). Thus,
we obtain a quantum version of Chandra et al.'s circuit. We call the circuit the GCLA_m circuit,
which stands for the generalized carry-lookahead approach for two m-bit binary numbers.
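
The remark that an OR gate becomes a T_t gate with NOT gates is an instance of De Morgan's law. The sketch below checks it on classical bits only. It assumes that T_t flips a target bit exactly when all t control bits are 1 (the definition in Section 3.1 is not reproduced here), and the function names are hypothetical.

```python
def t_gate(controls, target):
    """Assumed classical action of T_t: flip target iff every control bit is 1."""
    return target ^ int(all(controls))

def or_via_t_gate(bits):
    """OR of the input bits using one T_t gate surrounded by NOT gates."""
    negated = [b ^ 1 for b in bits]          # NOT on every input bit
    and_of_negations = t_gate(negated, 0)    # target starts at 0, ends as AND(negated)
    # In a reversible circuit the input NOTs would be undone here; omitted in this sketch.
    return and_of_negations ^ 1              # final NOT on the target gives OR(bits)

if __name__ == "__main__":
    print(or_via_t_gate([0, 0, 0]), or_via_t_gate([0, 1, 0]))
```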

The complexity of the GCLA_m circuit is analyzed as follows. To compute the depth of the
circuit, since the depth of the original circuit is O(1), it suffices to consider a T_{t_1} gate, where t_1 is
the maximum number of inputs of T_t gates in the GCLA_m circuit. The depth of the T_{t_1} gate is
O(log* t_1). Since t_1 = O(m log** m), the depth of the T_{t_1} gate is O(log*(m log** m)) and thus the
depth of the GCLA_m circuit is O(log*(m log** m)). To compute the size of the circuit, we define
A_t as the number of unbounded fan-in gates with t inputs in Chandra et al.'s circuit, which is
equal to the number of T_t gates in the GCLA_m circuit. Since the size of Chandra et al.'s circuit
is O(m log** m), Σ_t t A_t = O(m log** m). The size of a T_t gate is O(t). The number of the other
gates in the GCLA_m circuit is O(m log** m) (and the size of each gate is 1). Thus, the size of
the GCLA_m circuit is O(Σ_t t A_t + m log** m) = O(m log** m). A similar argument shows that the
number of ancillary qubits in the GCLA_m circuit is O(m log** m). That is, the GCLA_m circuit
uses O(m log** m) ancillary qubits and its depth and size are O(log*(m log** m)) and O(m log** m),
respectively.

4.2.2 Modification of Our Method 

We modify our method in Section 3.3 by using the GCLA_m circuit as the CARRY_l gate. Let
e(n) = Ω(log* n). We set k and l as in Section 3.3. Note that k = 2^{l-1} = O(e(n)). We assume that
we are allowed to use unbounded fan-out gates with length O(n). Chandra et al.'s circuit for two
⌊n/2^{l-1}⌋-bit binary numbers is directly applied to perform the operation performed by the CARRY_l
gate. Thus, we set m = ⌊n/2^{l-1}⌋. In this case, O(m log** m) = O(n log**(n/2^{l-1})/2^{l-1}), which is
bounded by O(n). Since we have unbounded fan-out gates with length O(n), we can use the
complexity analysis described in Section 4.2.1. The GCLA_m circuit, which is the CARRY_l gate, uses
O(n log**(n/2^{l-1})/2^{l-1}) ancillary qubits and its depth and size are O(log*(n log**(n/2^{l-1})/2^{l-1}))
and O(n log**(n/2^{l-1})/2^{l-1}), respectively. For simplicity, we consider slightly weaker bounds:
it uses O(n log** n/2^{l-1}) ancillary qubits and its depth and size are O(log*(n log** n/2^{l-1})) and
O(n log** n/2^{l-1}), respectively.

The complexity of the whole circuit obtained by the modified method is analyzed as in the
original method. Step 1 uses O(n/k) ancillary qubits and its depth and size are O(k) and O(n),
respectively. Step 2 uses O(n log** n/k) ancillary qubits and its depth and size are O(log*(n log** n/k))
and O(n log** n/k), respectively. Step 4 requires no new ancillary qubits and its depth and size
are O(k) and O(n), respectively. The other steps are similar to the above steps. Thus, the whole
circuit uses O(n log** n/e(n)) (= o(n)) ancillary qubits and its depth and size are O(e(n)) and
O(n), respectively. In particular, for e(n) = log* n, the modified method yields an O(log* n)-depth
O(n)-size circuit with O(n log** n/log* n) (= o(n)) ancillary qubits.

4.3 Reduction of the Length of an Unbounded Fan-Out Gate 

We prove that the length of an unbounded fan-out gate can be restricted to O(n^ε) in the modified
method without increasing the complexity of the circuit, where ε is any small positive constant.
Suppose that we are allowed to use unbounded fan-out gates with length f(n). An unbounded
fan-out gate with length t = O(m log** m) (and m = ⌊n/2^{l-1}⌋) can be simply simulated by using
an O(log t/log f(n) + 1)-depth O(t/f(n) + 1)-size circuit with no ancillary qubits that consists only
of unbounded fan-out gates with length f(n). In the following, using this simulation, we reconsider
the complexity of the T_t gate, the GCLA_m circuit, and the circuit our method in Section 4.2 yields.
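
One way to realize such a simulation is a tree of short fan-out gates applied in two passes. The sketch below is an illustration under that assumption and is not necessarily the construction the authors have in mind. It tracks classical bit values only, and the name simulate_fanout and the heap-style tree layout are choices made for this example. Because the targets may hold arbitrary values, a bottom-up pass first XORs each parent into its children, and a top-down pass then cancels those contributions so that only the control value remains.

```python
def simulate_fanout(bits, f):
    """Apply (x, y_1..y_t) -> (x, y_1^x, ..., y_t^x) with fan-out gates of length <= f.

    bits[0] is the control x; bits[1:] are the t targets, which may hold any values.
    Returns (new_bits, depth, size), where size counts the short fan-out gates used.
    """
    assert f >= 1
    t = len(bits) - 1
    bits = list(bits)

    # View the qubits as an f-ary tree rooted at the control (heap indexing):
    # the children of node p are p*f+1, ..., p*f+f. levels[L] holds one short
    # fan-out gate (control, targets) for each internal node at tree depth L.
    levels, frontier = [], [0]
    while frontier:
        gates = []
        for p in frontier:
            kids = [c for c in range(p * f + 1, p * f + f + 1) if c <= t]
            if kids:
                gates.append((p, kids))
        if gates:
            levels.append(gates)
        frontier = [c for _, kids in gates for c in kids]

    # Pass 1 (bottom-up, root level excluded): each node absorbs its parent's
    # original value. Pass 2 (top-down, root included): the parent now holds
    # parent XOR x, so the parent's value cancels and only x stays in the child.
    schedule = levels[:0:-1] + levels
    for layer in schedule:
        for control, kids in layer:      # gates within a layer act on disjoint qubits
            for c in kids:
                bits[c] ^= bits[control]
    size = sum(len(layer) for layer in schedule)
    return bits, len(schedule), size

if __name__ == "__main__":
    print(simulate_fanout([1] + [0] * 16, 4))
```

With t = 16 and f = 4 the tree has two levels of gates, so the schedule has 3 layers and 7 short fan-out gates in total. In general the sketch uses about 2 log t/log f layers and 2t/f gates for f ≥ 2 and no extra qubits, which is in line with the O(log t/log f(n) + 1) depth and O(t/f(n) + 1) size stated above.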

4.3.1 T_t gate

The T_t gate, which is Høyer et al.'s circuit for the T_t operation, is constructed as follows:

1. Construct an O(1)-depth O(t log t)-size circuit with O(t log t) ancillary qubits for reducing
the computation of OR of t bits to that of O(log t) bits.

2. Using the circuit in Step 1, for any d > 0, construct an O(d + log* t)-depth O(dt log^{(d)} t)-size
circuit for T_t with O(dt log^{(d)} t) ancillary qubits, where log^{(d)} t is the d-times iterated
logarithm log ··· log t.

3. Using the circuit in Step 2, construct an O(log* t)-depth O(t)-size circuit for T_t with O(t)
ancillary qubits.

We can modify the above steps using unbounded fan-out gates with length f(n) as follows:

1. Construct an O(log t/log f(n) + 1)-depth O(t log t)-size circuit with O(t log t) ancillary qubits
for reducing the computation of OR of t bits to that of O(log t) bits.

2. Using the circuit in Step 1, for any d > 0, construct an O(d + log* t + log t/log f(n) +
d log log t/log f(n))-depth O(dt log^{(d)} t)-size circuit for T_t with O(dt log^{(d)} t) ancillary qubits.

3. Using the circuit in Step 2, construct an O(log t/log f(n) + log* t)-depth O(t)-size circuit for
T_t with O(t) ancillary qubits.

To see this, we first analyze Step 1 in Høyer et al.'s construction. In this step, an unbounded
fan-out gate with length O(log t) is used in parallel to make O(log t) copies of each of the t input bits.
Moreover, an unbounded fan-out gate with length O(t) is used in parallel to prepare appropriate
ancillary qubits O(log t) times. As described above, an unbounded fan-out gate with length O(log t)
can be simulated by using an O(log log t/log f(n) + 1)-depth O(log t/f(n) + 1)-size circuit with no
ancillary qubits. Similarly, an unbounded fan-out gate with length O(t) can be simulated by
using an O(log t/log f(n) + 1)-depth O(t/f(n) + 1)-size circuit. Thus, the depth of the T_t gate is
O(log t/log f(n) + 1). The size is O(t · (log t/f(n) + 1) + (log t) · (t/f(n) + 1)) = O(t log t). These
simulations do not require any ancillary qubits. That is, in Step 1, the number of ancillary qubits
and size remain unchanged even if we consider unbounded fan-out gates with length f(n). Thus,
they also do so in Steps 2 and 3. Step 2 of Høyer et al.'s construction is done by using Step 1
O(log* t) times to reduce the computation of OR of t bits to that of a constant number of bits. Step
3 is done by reducing the computation of OR of t bits to that of t/log* t bits and by using Step
2 with d = log* t. These procedures can be simply applied to the case where we use unbounded
fan-out gates with length f(n) and imply the desired depth bound.

4.3.2 The GCLA_m circuit

To compute the depth of the GCLA_m circuit, it suffices to consider a T_{t_1} gate for some t_1 and an
unbounded fan-out gate with some length t_2. The depth of the T_{t_1} gate is O(log t_1/log f(n) + log* t_1)
and the depth of an unbounded fan-out gate with length t_2 is O(log t_2/log f(n) + 1). Since t_1 and
t_2 cannot be greater than the size of Chandra et al.'s circuit, the depth of the GCLA_m circuit is
O(log m/log f(n) + log*(m log** m)). To compute the size, we define B_t as the number of unbounded
fan-out gates with length t used (implicitly) in Chandra et al.'s original circuit, which is equal to
the number of unbounded fan-out gates with length t (that are not used in T_s gates for any s) in the
GCLA_m circuit. Since the size of Chandra et al.'s circuit is O(m log** m), Σ_t t B_t = O(m log** m). If
t ≥ f(n), an unbounded fan-out gate with length t can be simulated by an O(t/f(n))-size circuit.
Thus, the size related to unbounded fan-out gates with length greater than or equal to f(n) in
the GCLA_m circuit (that is, Σ_{t ≥ f(n)} (t/f(n)) B_t) is O(m log** m) since Σ_t t B_t = O(m log** m).
The size related to the T_t gates (that is, O(Σ_t t A_t)) is O(m log** m). The number of the other
gates is O(m log** m) (and the size of each gate is 1). Thus, the size of the GCLA_m circuit is
O(m log** m). The number of ancillary qubits is the same as the size. That is, the GCLA_m circuit
uses O(m log** m) ancillary qubits and its depth and size are O(log m/log f(n) + log*(m log** m))
and O(m log** m), respectively. Since m = ⌊n/2^{l-1}⌋, the circuit uses O(n log**(n/2^{l-1})/2^{l-1})
ancillary qubits and its depth and size are O(log(n/2^{l-1})/log f(n) + log*(n log**(n/2^{l-1})/2^{l-1})) and
O(n log**(n/2^{l-1})/2^{l-1}), respectively. For simplicity, we consider slightly weaker bounds: it uses
O(n log** n/2^{l-1}) ancillary qubits and its depth and size are O(log n/log f(n) + log*(n log** n/2^{l-1}))
and O(n log** n/2^{l-1}), respectively.

4.3.3 Our Circuit 

We set f(n) = n^ε and use the GCLA_m circuit as the CARRY_l gate, where ε is any small positive
constant. In this case, the CARRY_l gate uses O(n log** n/2^{l-1}) ancillary qubits and its depth and
size are O(log*(n log** n/2^{l-1})) and O(n log** n/2^{l-1}), respectively. This is the same situation as
that in Section 4.2 except that the length of an unbounded fan-out gate in the CARRY_l gate is
at most n^ε. Thus, the whole circuit uses O(n log** n/e(n)) (= o(n)) ancillary qubits and its depth
and size are O(e(n)) and O(n), respectively. If we set e(n) = log* n, we obtain an O(log* n)-depth
O(n)-size circuit with o(n) ancillary qubits.
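
The depth claim for the CARRY_l gate follows from the Section 4.3.2 bound once f(n) = n^ε is substituted. The short derivation below is added as an illustration; it treats ε as a fixed constant, so that 1/ε is absorbed into the O-notation.

\[
\frac{\log n}{\log f(n)} = \frac{\log n}{\varepsilon \log n} = \frac{1}{\varepsilon} = O(1),
\qquad\text{so}\qquad
O\!\left(\frac{\log n}{\log f(n)} + \log^{*}\!\frac{n \log^{**} n}{2^{l-1}}\right)
= O\!\left(\log^{*}\!\frac{n \log^{**} n}{2^{l-1}}\right).
\]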

It is worth noting that the above method for constructing a circuit for ADD_n yields an
o(log n)-depth O(n)-size circuit with o(n) ancillary qubits using unbounded fan-out gates with a small
length. For example, we set f(n) = log n and d(n) = log n/log log n. In this case, the CARRY_l
gate uses O(n log** n log log n/log n) ancillary qubits and its depth and size are O(log n/log log n)
and O(n log** n log log n/log n), respectively. This yields an O(log n/log log n)-depth O(n)-size
circuit with O(n log** n log log n/log n) ancillary qubits. Such an o(log n)-depth circuit cannot be
constructed by using a quantum circuit only with gates on a bounded number of qubits [15] or
by using a classical circuit only with bounded fan-in and unbounded fan-out gates [16]. Hence,
unbounded fan-out gates even with a small length are useful for constructing efficient quantum
circuits for addition.
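
For the choice f(n) = log n, the depth of the CARRY_l gate can be read off from the Section 4.3.2 bound as follows. This is a brief derivation added for illustration, using the weaker estimate log*(n log** n) for the second term:

\[
\frac{\log n}{\log f(n)} = \frac{\log n}{\log\log n},
\qquad
\log^{*}(n \log^{**} n) = O(\log^{*} n) = o\!\left(\frac{\log n}{\log\log n}\right),
\]

so the fan-out simulation term dominates and the depth is O(log n/log log n).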
5 Application



We consider the prime field GF(p) for some prime p > 3. An elliptic curve E over GF(p) is the set
of points (x, y) ∈ GF(p) × GF(p) satisfying y^2 = x^3 + ax + b, where the constants a, b ∈ GF(p) satisfy
4a^3 + 27b^2 ≠ 0, together with the point at infinity O. It is known that the addition operation in E
can be defined and that E with the addition operation forms an abelian group with O serving as
its identity [20]. Let P ∈ E, ⟨P⟩ be the subgroup of E generated by P, and |⟨P⟩| be the order of
⟨P⟩. The discrete logarithm problem over the elliptic curve E with respect to the base P is defined
as follows: Given a point Q ∈ ⟨P⟩, find the integer 0 ≤ d ≤ |⟨P⟩| − 1 such that Q = dP. Shor's
discrete logarithm algorithm solves the problem in time polynomial in the length of the binary
representation for |⟨P⟩| with high probability [1]. As in [5], we assume that the length of the binary
representation for |⟨P⟩| is equal to that of the binary representation for p.
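
To make the problem statement concrete, the following toy sketch implements the group law on a small curve and solves the discrete logarithm problem by exhaustive search. It is a classical brute-force illustration only, not Shor's algorithm and not part of any circuit in this paper. The curve y^2 = x^3 + 2x + 2 over GF(17), the base point (5, 1), and all function names are assumptions chosen for this example.

```python
P_MOD, A, B = 17, 2, 2   # toy parameters (illustrative): y^2 = x^3 + 2x + 2 over GF(17)
INF = None               # the point at infinity O, the identity of the group

def inv(x):
    """Inverse in GF(p) via Fermat's little theorem (x must be nonzero mod p)."""
    return pow(x % P_MOD, P_MOD - 2, P_MOD)

def ec_add(p1, p2):
    """Group law on E in affine coordinates."""
    if p1 is INF:
        return p2
    if p2 is INF:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return INF                                     # P + (-P) = O
    if p1 == p2:
        s = (3 * x1 * x1 + A) * inv(2 * y1) % P_MOD    # slope of the tangent line
    else:
        s = (y2 - y1) * inv(x2 - x1) % P_MOD           # slope of the chord
    x3 = (s * s - x1 - x2) % P_MOD
    return (x3, (s * (x1 - x3) - y1) % P_MOD)

def ec_mul(d, p):
    """Compute dP by repeated addition (adequate for a toy group)."""
    r = INF
    for _ in range(d):
        r = ec_add(r, p)
    return r

def discrete_log(q, p):
    """Exhaustive search for d with Q = dP; returns None if Q is not in <P>."""
    r, d = INF, 0
    while True:
        if r == q:
            return d
        r, d = ec_add(r, p), d + 1
        if r is INF:          # wrapped around the whole subgroup without finding Q
            return None

if __name__ == "__main__":
    base = (5, 1)
    q = ec_mul(7, base)
    print(q, discrete_log(q, base))
```

The exhaustive search takes time proportional to |⟨P⟩|, which is exponential in the length of the binary representation of p; this is the cost that the polynomial-time quantum algorithm avoids.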

Proos et al. constructed an efficient quantum circuit for Shor's discrete logarithm algorithm
for elliptic curves over GF(p) [5]. Let n be the length of the binary representation for p. The
depth and size of the circuit are O(n^3). The dominant cost is O(n^2) applications of an O(n)-depth
O(n)-size quantum circuit for ADD_n with n ancillary qubits. For counting the number of qubits
in the circuit, it suffices to count the number of qubits in the circuit for division in GF(p) that
maps |x⟩|y⟩ to |x⟩|y/x⟩ for x (≠ 0), y ∈ GF(p). The circuit for division in GF(p) uses about 5n
qubits: 2n qubits are used for the input register and about 3n qubits are used in the circuit for
the extended Euclidean algorithm. In the circuit for the extended Euclidean algorithm, about 2n
qubits are used for the input binary numbers and intermediate results, and n qubits are used for
ancillary qubits during ADD_n.

By simply replacing Proos et al.'s circuit for ADD_n with our circuit in Section 2, we can eliminate
the n ancillary qubits during ADD_n since our circuit for ADD_n does not use any ancillary qubits.
The resulting circuit uses about 4n qubits. Since Proos et al. do not describe the precise depth
or size of their circuit for ADD_n, we cannot compare the depth or size of the resulting circuit
with that of the original one precisely. However, the depth and size of our circuit for ADD_n are
asymptotically the same as those of Proos et al.'s. Thus, the depth and size of the resulting circuit
are asymptotically the same as those of the original circuit.

By adding o(n) ancillary qubits to the circuit obtained above, we can decrease the depth
asymptotically. As shown in Section 3, for any d(n) = Ω(log n), we have an O(d(n))-depth O(n)-size
circuit for ADD_n with O(n/d(n)) ancillary qubits. If we use this circuit as above, we obtain an
O(n^2 d(n))-depth O(n^3)-size circuit for Shor's discrete logarithm algorithm with 4n + O(n/d(n))
qubits. Moreover, as shown in Section 4, if we are allowed to use unbounded fan-out gates with
length O(n^ε) for an arbitrary small positive constant ε, we have an O(e(n))-depth O(n)-size circuit
for ADD_n with o(n) ancillary qubits for any e(n) = Ω(log* n). This circuit yields an
O(n^2 e(n))-depth O(n^3)-size circuit for Shor's discrete logarithm algorithm with 4n + o(n) qubits. We can also
use the previous circuits for ADD_n to improve Proos et al.'s circuit. However, they do not yield
more efficient quantum circuits for Shor's discrete logarithm algorithm than our circuits described
above. This is simply because our circuits for ADD_n are more efficient than the previous ones.

6 Conclusions and Future Work 

We constructed an O(n)-depth O(n)-size quantum circuit for ADD_n with no ancillary qubits. The
size is less than that of any other quantum circuit ever constructed for ADD_n with no ancillary
qubits. Using the circuit, we proposed a method for constructing a small-size quantum circuit for
ADD_n with a small number of qubits that has a given depth. In particular, we showed that, if we are
allowed to use unbounded fan-out gates with length O(n^ε) for an arbitrary small positive constant
ε, we can construct an O(log* n)-depth O(n)-size circuit with o(n) ancillary qubits. We applied
our circuits to constructing efficient quantum circuits for Shor's discrete logarithm algorithm.

Interesting challenges would be to find ways of improving the quantum circuits described in this
paper. For example, can we construct an O(log n)-depth O(n)-size quantum circuit for ADD_n with
O(1) ancillary qubits? Can we construct an O(1)-depth O(n)-size quantum circuit for ADD_n with
O(n) ancillary qubits using unbounded fan-out gates? In the classical case, we cannot construct
an O(1)-depth O(n)-size (that is, the number of edges) circuit for addition with unbounded fan-in
and fan-out gates [21].